ABSTRACT



Statistical analyses of large surveys for transiting planets such as the Kepler
mission must account for systematic errors and biases. Transit detection depends 
not only on the planet's radius and orbital period, but also on host star properties. 
Thus, a sample of stars with transiting planets may not accurately represent the 
target population. Moreover, targets are selected using criteria such as a limiting 
apparent magnitude. These selection effects, combined with uncertainties in 
stellar radius, lead to biases in the properties of transiting planets and their host 
stars. We quantify possible biases in the Kepler survey. First, Eddington bias 
produced by a steep planet radius distribution and uncertainties in stellar radius 
results in a 15-20% overestimate of planet occurrence. Second, the magnitude 
limit of the Kepler target catalog induces Malmquist bias towards large, more 
luminous stars and underestimation of the radii of about one third of candidate 
planets, especially those larger than Neptune. Third, because metal-poor stars 
are smaller, stars with detected planets will be very slightly (<0.02 dex) more 
metal-poor than the target average. Fourth, uncertainties in stellar radii produce 
correlated errors in planet radius and stellar irradiation. A previous finding,
that highly-irradiated giant planets are more likely to have "inflated" radii, remains
significant, even accounting for this effect. In contrast, transit depth is negatively
correlated with stellar metallicity even in the absence of any intrinsic correlation, 
and a previous claim of a negative correlation between giant planet transit depth 
and stellar metallicity is probably an artifact. 

Subject headings: Planetary systems — Planets and satellites: detection — Stars: fundamental parameters — Methods: statistical

1. Introduction



When an exoplanet's orbital plane lies along our line of sight, the planet will transit its
host star, periodically obscuring a small portion of the stellar disk and producing detectable
dips in a photometric lightcurve. The first transits (of a planet previously discovered by
Doppler) were observed in 1999 (Henry et al. 2000; Charbonneau & Brown 2000). The first
exoplanet to be detected with the transit technique was confirmed by Doppler observations
in 2002 (Konacki et al. 2003). As of 18 October 2012, 288 confirmed transiting planets in
233 systems have been reported (Schneider et al. 2011). The Kepler mission, operating
since 2009, has identified more than 2300 candidate transiting planets (Kepler Objects of
Interest or KOIs) (Batalha 2012, hereafter B12). Although only a small fraction of KOIs
have been confirmed, the false positive rate is thought to be low (Morton & Johnson 2011;
Lissauer et al. 2012), but see Santerne et al. (2012).



Transit searches are more sensitive than Doppler searches to the smallest planets
because the transit signal scales with the square of the planet radius $R_p$, while the Doppler
signal of a rocky planet scales approximately as $R_p^4$ (Valencia et al. 2007). Kepler has
already discovered 90 candidates possibly smaller than Earth. Transiting planets are of
special interest because their radii can be estimated from the transit signal. If the transit is
not grazing, the fractional decrease $\delta$ in the star's observed flux is

$$\delta = \left(\frac{R_p}{R_*}\right)^2, \qquad (1)$$

where $R_*$ is the stellar radius. A measurement of $\delta$ combined with knowledge of $R_*$ yields
the planet radius. Because the inclination of a transiting planet's orbit is near 90°, the
mass of the planet can also be unambiguously established from Doppler observations.
Combinations of mass and radius can be compared with predictions by models of interior
structure (Seager et al. 2007; Grasset et al. 2009; Rogers et al. 2011). Spectroscopy or
spectrophotometry during transits can detect or rule out constituents in an atmosphere
(Charbonneau et al. 2002; Bean et al. 2010; Desert et al. 2011), and secondary eclipses
or occultations of the planet can constrain temperature and albedo (Charbonneau et al.
2005; Knutson et al. 2008; Rowe et al. 2008). Additional planets can be discovered by
variation in transit times (Agol et al. 2005; Ford et al. 2011).



Analyses of large samples of transiting planets, including the catalog of KOIs, have
attempted to ascertain properties of transiting planet populations, e.g., whether they are
segregated into discrete groups (Fressin et al. 2009), the distribution with planet radius
(Howard et al. 2012), the dependence of planet occurrence on the metallicity of the host
star (Schlaufman & Laughlin 2011; Buchhave et al. 2012), the effect of stellar irradiance
on giant planet radius (Demory & Seager 2011; Enoch et al. 2012), and the occurrence
of planets compared to Doppler surveys (Gaidos et al. 2012; Wright et al. 2012;
Wolfgang & Laughlin 2012). In the case of Kepler, the lack of Doppler confirmation for most
candidate planets, as well as of detailed spectroscopic characterization of the stars, makes it
important to properly account for any systematic effects or biases.

Detection of a planet in a transit survey depends on the properties of the planet,
most notably $R_p$ (Equation [1]), but also on the orbital period because it determines the
number of transits that are observed and the total transit signal. Gaudi (2005), Gaudi et al.
(2005), and Pont et al. (2006) pointed out that transit-selected samples are biased toward
large planets on short-period orbits. These biases can be extreme in ground-based surveys
which suffer correlated ("red") noise from variations in atmospheric transmission and
discontinuous observing windows.

Equation [1] also shows that planets of a given radius will be more readily detected
around smaller stars. This has, in part, motivated transit searches for planets around M
dwarf stars (Tarter et al. 2007; Gaidos et al. 2007; Charbonneau & Deming 2007). In this
case, a property of the host star, as opposed to the planet, influences the likelihood that
a transiting planet will be detected, and that both star and planet will be included in a
transit-selected sample. Thus a selection effect will act on stellar radius, or on any property
that is related to stellar radius, such as metallicity. This will produce systematic offsets or
biases in the properties of stars hosting known transiting planets relative to the properties
of the target sample.

The construction of a target catalog itself can also produce selection effects in a transit
survey. Most notable among constraints on target stars is an apparent magnitude limit
because of a signal-to-noise ratio (SNR) requirement, or the need to confirm candidate
transiting systems using Doppler observations. A magnitude limit will cause (Malmquist)
bias towards more luminous stars; these can be included to larger distances and hence
sample a larger volume of space. At a given effective temperature, luminosity is uniquely
related to stellar radius, and hence this is also a bias towards larger stars that, unmitigated,
will affect the detection of planets and estimates of their radii.

Some of these effects would disappear or could be corrected if stellar parameters,
i.e. radius, were precisely established. But, up to now, the large scale of transit surveys
($10^4$ to $10^7$ stars) has precluded such determinations. Neither radius nor mass is directly
observable for distant, single stars such as Kepler targets. The properties of most Kepler
stars have been inferred by comparing stellar models to the broad-band photometry of the
Kepler Input Catalog (KIC) (Brown et al. 2011, hereafter Br11). Few spectra and almost
no parallax (distance) measurements are available, and most stars have only upper limits
on proper motion. KIC estimates of stellar radii have large uncertainties due to (i) errors in
photometry; (ii) degeneracies between stellar parameters and colors; and (iii) errors in the
models themselves. While KIC estimates of stellar effective temperature are comparatively
robust, those of surface gravity ($\log g$) and metallicity ([Fe/H]) are not as reliable (Br11).
Br11 concluded that no gravity or radius information could be inferred for stars hotter than
5400 K ($g - r < 0.65$). Verner et al. (2011) found that the KIC and asteroseismic radii of 500
solar-type stars have random discrepancies of order 50% and a systematic offset of about
the same amount. Bruntt et al. (2012) found a similar scatter but negligible systematic
offset in $\log g$ (and hence the radius) of 93 solar-type Kepler stars. Mann et al. (2012) found
that many M-type stars that were classified as dwarfs or were unclassified in the KIC are
actually evolved stars.

Selection effects acting on uncertainties in stellar radius will bias the observed
properties of planet-hosting stars with respect to their true distributions. For example,
while essentially all M-type hosts of KOIs are bona fide dwarf stars (Muirhead et al.
2012), the vast majority of the bright ($K_p < 14$) targets and some fainter stars are giants
(Mann et al. 2012). This disparity is a result of the strong selection effect on stellar radius
described above; planets are far more difficult to detect around giant stars due to their
large size and higher variability (Huber et al. 2010). Because of the relation between planet
radius and stellar radius (Equation [1]), estimates of planet radius will likewise be affected.

Here, we quantify five effects produced by selection bias and uncertainties in stellar
parameters in the Kepler survey. In Section [2] we derive useful scaling relationships for
selection effects on transiting planet detection and target star selection. In Section [3] we
apply these concepts to the Kepler survey using the KOI catalog, parameters from the
KIC, and models of stellar evolution and stellar populations. We describe our methods
and models in Section [3.1]. In Section [3.2] we calculate the effect of Eddington bias on
the radius distribution of KOIs as a result of uncertainties in stellar radius. In Section [3.3]
we describe the effect of Malmquist bias on the magnitude-limited Kepler target catalog
and the preferential inclusion of more luminous, larger stars, thus biasing downwards
the radius of some KOIs. In Section [3.4] we estimate the bias towards lower metallicity
among KOI-hosting stars as a consequence of the relationship between stellar metallicity
and radius on the main sequence. In Section [3.5] we describe how uncertainties in stellar
radius produce correlated errors in planet radius and stellar luminosity, potentially
affecting statistics describing the relationship between "inflated" giant planets and stellar
irradiation. In Section [3.6] we consider the effect of stellar metallicity on transit depth and
the interpretation of any correlation between metallicity and the radii of giant planets.
We summarize our results and describe current and future efforts to better determine the
parameters of Kepler stars in Section [4].



2. Analytical scaling relations 



In a transit survey, selection bias acts on a quantity $X$ (a stellar or planetary
parameter) when the probability $f$ that a star is included in the survey, or that a planet is
detected transiting a star, depends on that parameter. This bias is superposed on any real
correlations and will persist to the extent that the values of the parameter and its effect on
inclusion or detection are imperfectly quantified. The bias $\delta X$ is the difference between the
observed mean $\langle X f \rangle / \langle f \rangle$ and the intrinsic mean $\langle X \rangle$, or

$$\delta X = \frac{\langle X f \rangle - \langle X \rangle \langle f \rangle}{\langle f \rangle}, \qquad (2)$$



where the brackets represent marginalizing over the population of stars, subject to any
constraints. To derive useful scaling relations, we chose apparent brightness (magnitude)
and effective temperature $T_e$ as independently varying parameters. The first fixes the
noise level against which a transit must be detected. Moreover, the Kepler target catalog is
magnitude-limited (Batalha et al. 2010). Among main sequence stars, $T_e$ is closely related
to mass, an important parameter of planet populations (Johnson et al. 2010; Howard et al.
2012). Unlike other stellar parameters, it can be robustly estimated from KIC photometry
(Br11; Pinsonneault et al. 2012). Effective temperature is thus a convenient plotting
parameter which minimizes complications due to variation in the planet population along
the main sequence. Nevertheless, values of $T_e$ do not map to unique values of stellar mass
because stars have different ages and metallicities, and in plots with $T_e$ the dependence on
mass should be considered "blurred". Calculations using stellar models, as described below,
explicitly take into account the effects of age and metallicity.
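
As a minimal sketch of Equation [2], assuming the population averages can be approximated by plain means over a list of stars, the bias can be evaluated numerically. The function name `selection_bias`, the toy radii, and the assumed inclusion probability (a power law in stellar radius) are illustrative choices and are not taken from the Kepler catalog.

```python
def selection_bias(x, f):
    """Return (<X f> - <X><f>) / <f> for paired samples of X and f."""
    n = len(x)
    mean_x = sum(x) / n
    mean_f = sum(f) / n
    mean_xf = sum(xi * fi for xi, fi in zip(x, f)) / n
    return (mean_xf - mean_x * mean_f) / mean_f


# Hypothetical toy population: stellar radii in solar units.
radii = [0.6, 0.8, 1.0, 1.2, 1.5]
# Assumed inclusion probability that falls with stellar radius.
prob = [r ** -0.95 for r in radii]
# Negative bias: smaller stars are over-represented in the selected sample.
bias = selection_bias(radii, prob)
```

When the inclusion probability is the same for every star, the numerator vanishes and the bias is zero, which is a quick sanity check on any implementation.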



2.1. Selection bias due to transit detection 



We first estimate the probability $f$ that a star is included in a catalog of transiting
systems. The probability of detecting a planet is calculated as a function of both stellar
properties (radius and mass $R_*$ and $M_*$) and planet properties (radius $R_p$ and orbital
period $P$), and then marginalized over planet properties using an appropriate distribution
function. This yields $f$ as a function of $R_*$ and $M_*$. Equation [2] can then be evaluated using
a stellar model that describes the intrinsic distributions of these parameters and their
relations to other observables. In many instances we can use scaling relations rather than
exact relations because Equation [2] is normalized.



We adopted a double power-law for the intrinsic distribution $f'$ of planets with
radius and orbital period (Cumming et al. 2008; Howard et al. 2010; Mayor et al. 2011;
Howard et al. 2012),

$$df' \sim R_p^{-\alpha} P^{-\beta} \, d\ln R_p \, d\ln P, \qquad (3)$$



for $P$ larger than some minimum value $P_{min}$ where planets are found. Transit detection
depends on the geometric probability that the planet is on a transiting orbit, as well as
the signal (depth) of the transit relative to noise.

In the absence of coherent or "red" noise from the atmosphere, the signal-to-noise ratio
of a single transit is $\sqrt{N} (R_p/R_*)^2$, where $N$ is the total number of photons detected during
the event. In an observation interval $t$ about $t/P$ transits will be observed, bringing the
total number of photons to $\sim Nt/P$. Therefore the signal-to-noise ratio of the co-added
transits is

$$\mathrm{SNR} \approx \sqrt{\frac{N t}{P}} \left(\frac{R_p}{R_*}\right)^2. \qquad (4)$$

At a given apparent brightness, $N \sim \tau$, where $\tau$ is the transit duration. When the transit
impact parameter is low and the transit chord is close to the stellar diameter, $\tau \approx 2R_*/V$.
Assuming a near-circular orbit, the transverse velocity $V$ can be expressed in terms of the
orbital period and mass of the star and

$$\tau \approx 2 R_* \left(\frac{P}{2\pi G M_*}\right)^{1/3}, \qquad (5)$$

where $G$ is the gravitational constant. We derive a scaling relation between SNR and
planet/star properties by substituting $\tau$ for $N$ in Equation [4] and ignoring constant factors:

$$\mathrm{SNR} \sim \sqrt{t} \, R_p^2 P^{-1/3} R_*^{-3/2} M_*^{-1/6}. \qquad (6)$$

Solving for $R_p$ gives a scaling relation for the radius of the smallest planet on a given orbital
period that can be detected at a fixed SNR threshold:

$$R_{min} \sim R_*^{3/4} M_*^{1/12} P^{1/6}. \qquad (7)$$

Likewise, there is a relation for the maximum orbital period at which a planet of a given
radius $R_p$ can be detected at a fixed SNR threshold:

$$P_{max} \sim R_p^6 R_*^{-9/2} M_*^{-1/2}. \qquad (8)$$

$P_{max}$ is a sensitive function of $R_p$, underscoring why transit surveys are highly biased
towards the largest planets (Gaudi 2005).
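
The scaling relations of Equations [6] to [8] can be checked for mutual consistency with a short numerical sketch. All constant factors are set to unity, so only ratios are meaningful; the function names are hypothetical and the inputs are in arbitrary relative units.

```python
import math


def snr_scale(r_p, period, r_star, m_star, t_obs=1.0):
    """Relative SNR of co-added transits (Equation 6), constants dropped."""
    return (math.sqrt(t_obs) * r_p ** 2 * period ** (-1 / 3)
            * r_star ** (-3 / 2) * m_star ** (-1 / 6))


def r_min_scale(period, r_star, m_star):
    """Relative radius of the smallest detectable planet (Equation 7)."""
    return r_star ** (3 / 4) * m_star ** (1 / 12) * period ** (1 / 6)


def p_max_scale(r_p, r_star, m_star):
    """Relative longest detectable orbital period (Equation 8)."""
    return r_p ** 6 * r_star ** (-9 / 2) * m_star ** (-1 / 2)


# Doubling the planet radius raises the maximum period by 2**6 = 64.
ratio = p_max_scale(2.0, 1.0, 1.0) / p_max_scale(1.0, 1.0, 1.0)
```

Evaluating `snr_scale` at `r_min_scale(...)` or at `p_max_scale(...)` for the same star returns the same threshold value, which confirms that the three relations are rearrangements of a single condition.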



To obtain the observed distribution $f$ of planets with $R_*$ and $M_*$, we multiply the
intrinsic distribution (Equation [3]) by the geometric probability that a planet is on a
transiting orbit. For circular orbits this is proportional to the ratio of the stellar radius
to orbital semimajor axis $R_*/a$ which, based on Newtonian orbital dynamics, scales as
$R_* M_*^{-1/3} P^{-2/3}$. The observed planet distribution is:

$$df \sim R_* M_*^{-1/3} R_p^{-\alpha} P^{-(\beta+2/3)} \, d\ln R_p \, d\ln P. \qquad (9)$$

We marginalize Equation [9] over both $P$ and $R_p$, first integrating from $P_{min}$ to $P_{max}$. The
maximum period is also limited by the observing window and the requirement that more
than one transit must be observed, e.g. $P_{max} < t/3$.

Integration of $P^{-(\beta+2/3)} d\ln P$ in Equation [9] yields a factor proportional to
$P_{min}^{-(\beta+2/3)} - P_{max}^{-(\beta+2/3)}$. If $P_{min} \ll t$, then Equation [8] is used to re-express this as
$P_{min}^{-(\beta+2/3)} [1 - (R_p/R_m)^{-(6\beta+4)}]$, where $R_m$ is the radius of the smallest planet that can be
detected at $P = P_{min}$ (i.e. that can be detected at all). Integration of Equation [9] over $R_p$
from $R_m$ to $\infty$ produces:

$$f \sim R_* M_*^{-1/3} P_{min}^{-(\beta+2/3)} R_m^{-\alpha} \int_1^\infty x^{-(\alpha+1)} \left(1 - x^{-(6\beta+4)}\right) dx, \qquad (10)$$

where $x = R_p/R_m$. Because the $P_{min}$ factor and the integral depend only on $\alpha$ and $\beta$, which
are planet population parameters and not stellar properties, they can be ignored when
calculating biases in stellar properties. Substituting for $R_m$ and retaining only factors that
depend on stellar properties,

$$f \sim R_*^{1-3\alpha/4} M_*^{-(1/3+\alpha/12)}. \qquad (11)$$

All else being equal, planets are more likely to be detected around stars with smaller radii
(because transit depths are larger) and/or lower masses (because transit durations are
longer). Smaller stars are thus more likely to appear in a transit-selected sample. In the
case of a mass-radius relation $R_* \sim M_*^{0.8}$ for zero-age solar-type stars (Cox 2000) and a planet
radius distribution power-law index $\alpha = 2.6$ (Howard et al. 2012), then $f \sim M_*^{-1.31}$. This
is simply a statement that smaller (and more) planets can be detected around lower mass
stars. Older stars will have a steeper mass-radius relation, and as a result the dependence
of $f$ on $M_*$ will be more pronounced.
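
The quoted mass dependence follows from combining the exponents of Equation [11] with a power-law mass-radius relation. A minimal sketch of that arithmetic is given here; the function names are hypothetical, and the mass-radius slope is passed in as a free parameter.

```python
def detection_exponents(alpha):
    """Exponents (on R_*, on M_*) of the detection probability f in Equation 11."""
    return 1 - 3 * alpha / 4, -(1 / 3 + alpha / 12)


def mass_exponent(alpha, mr_slope):
    """Exponent of f on M_* when R_* scales as M_* ** mr_slope."""
    exp_r, exp_m = detection_exponents(alpha)
    return exp_r * mr_slope + exp_m


# alpha = 2.6 and R_* ~ M_*^0.8 give an exponent of about -1.31.
exponent = mass_exponent(alpha=2.6, mr_slope=0.8)
```

A steeper mass-radius relation (larger `mr_slope`) makes the exponent more negative, in line with the statement about older stars.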
At a given apparent brightness (observed flux), the quantity $B R_*^2/d^2$ is fixed, where
$B$ is the stellar surface brightness in the bandpass of interest and $d$ is the distance to the
star. Substituting $R_* \sim d/\sqrt{B}$ into Equation [11] and assuming that the planet population
is distance-independent so that the distance factor can be moved outside the period and
radius integrals, the scaling relation for observed occurrence becomes:

$$f \sim d^{1-3\alpha/4} B^{3\alpha/8-1/2} M_*^{-(1/3+\alpha/12)}. \qquad (12)$$

If $\alpha > 4/3$ (e.g., Howard et al. 2012), closer and hotter host stars are more likely to
be included in transit-selected samples. Stellar age and metallicity, which affect the
relationships between stellar mass, radius, and surface brightness, are also biased as a
result. A correlation between stellar properties and distance can modulate the degree of this
bias. For example, if more distant stars tend to be more evolved along the main sequence
and thus hotter, the bias will be less than if $d$ and $B$ are independent. Equation [12] does
not consider that stars of a certain mass or radius may be over-represented in the parent
population: this is discussed in the next section.



2.2. Selection bias due to target selection

The target catalogs of transit surveys such as Kepler are selected using a number of
criteria, and chief among these is apparent magnitude. A magnitude-limited sample of stars
will be biased towards the most luminous objects, which will be included to greater distances
(Malmquist 1922). These stars may be either more massive, more evolved, or both. At
a given $T_e$ and thus fixed surface brightness $B$ (ignoring the weak dependence of surface
brightness on gravity and metallicity), the signal $N$ from a star during a transit will scale
as $(R_*/d)^2$. Modifying Equation [6] appropriately, we find that the transit signal-to-noise
ratio scales as

$$\mathrm{SNR} \sim R_p^2 P^{-1/3} R_*^{-1/2} M_*^{-1/6} d^{-1}. \qquad (13)$$

The smallest planet that can be detected at a given SNR will scale as

$$R_{small} \sim P^{1/6} R_*^{1/4} M_*^{1/12} d^{1/2}. \qquad (14)$$

Multiplying a power-law distribution of planet radii (Equation [3]) by the probability that a
planet is on a transiting orbit ($\sim M_*^{-1/3} R_*$) and integrating over all planet radii down to
$R_{small}$ gives the relation

$$f \sim P^{-\alpha/6} R_*^{1-\alpha/4} M_*^{-(1/3+\alpha/12)} d^{-\alpha/2}. \qquad (15)$$

At a fixed color/temperature/surface brightness $B$, a magnitude-limited survey will include
stars of radius $R_*$ out to a distance $d_{max} \sim R_*$. Assuming, for the moment, that transits can
be detected to arbitrarily large distances, then integrating Equation [15] over a homogeneous
volume of radius $d_{max}$ yields

$$f \sim P^{-\alpha/6} R_*^{4-3\alpha/4} M_*^{-(1/3+\alpha/12)}. \qquad (16)$$

For $\alpha = 2.6$ and at a given $P$, $f$ scales as $R_*^{2.05} M_*^{-0.55}$. This relation illustrates how larger,
more evolved stars can be preferentially included in a transit-selected sample despite the
fact that transits of these stars are more difficult to detect.

Although target stars in a magnitude-limited sample will be included to a distance
$d_{max} \sim R_*$, a planet of radius $R_p$ can only be detected to a distance $d_{det}$ where, according
to Equation [13],

$$d_{det} \sim R_p^2 P^{-1/3} R_*^{-1/2} M_*^{-1/6}. \qquad (17)$$

The detection limit decreases with $R_*$ while the inclusion limit $d_{max}$ increases with $R_*$.
These limits coincide ($d_{max} = d_{det}$) at a stellar radius $\bar{R}_*$:

$$\bar{R}_* = R_0 R_p^{4/3} P^{-2/9} M_*^{-1/9}, \qquad (18)$$

where $R_0$ is a constant factor, $P$ is in days, $R_p$ is in Earth radii and $M_*$ is in solar masses.
(We calculate values of $\bar{R}_*$ for the Kepler survey in Section [3.3].) Detection of planets
of a given size around stars with $R_* < \bar{R}_*$ is magnitude-limited and subject to a stellar
radius bias that scales as $\sim R_*^4$, because the sample volume increases as $R_*^3$ and the transit
probability increases as $R_*$. For stars with $R_* > \bar{R}_*$, a survey is limited to a volume
proportional to $d_{det}^3 \sim R_*^{-3/2}$ (see Equation [17]), and the bias scales as $\sim R_*^{-1/2}$, a weak
dependence on $R_*$ in the opposite sense. The critical stellar radius $\bar{R}_*$ is most sensitive to
planet radius and the dependence on period and stellar mass is weak.
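
The exponents of Equation [16] and the crossover of Equation [18] can be reproduced with a few lines of arithmetic. This is a sketch under the assumption that every proportionality constant (including $R_0$ and the constant relating $d_{max}$ to $R_*$) is set to unity; the function names are hypothetical.

```python
def target_selected_exponents(alpha):
    """Exponents (on R_*, on M_*) of f in Equation 16 at fixed period."""
    return 4 - 3 * alpha / 4, -(1 / 3 + alpha / 12)


def d_det_scale(r_p, period, r_star, m_star):
    """Relative distance to which a planet can be detected (Equation 17)."""
    return r_p ** 2 * period ** (-1 / 3) * r_star ** (-1 / 2) * m_star ** (-1 / 6)


def critical_radius_scale(r_p, period, m_star, r0=1.0):
    """Relative stellar radius where d_max equals d_det (Equation 18)."""
    return r0 * r_p ** (4 / 3) * period ** (-2 / 9) * m_star ** (-1 / 9)


# For alpha = 2.6 the exponents are about 2.05 (radius) and -0.55 (mass).
exp_r, exp_m = target_selected_exponents(2.6)
# With unit constants, d_max = R_*, so d_det should equal r_crit at the crossover.
r_crit = critical_radius_scale(r_p=2.0, period=10.0, m_star=0.8)
```

Because the crossover radius scales as $R_p^{4/3}$ but only as $P^{-2/9}$ and $M_*^{-1/9}$, changing the planet radius moves it far more than a comparable change in period or stellar mass.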



3. Application to the Kepler transit survey 



3.1. Methods 



To evaluate biases and selection effects in the Kepler survey we modeled target stars
with isochrones from the Dartmouth Stellar Evolution Database (Dotter et al. 2008)
interpolated onto a 0.1-dex grid of metallicities using the on-line tool. For each star, we
compared adjusted KIC parameters ($\mathcal{T} = T_e$, $\mathcal{G} = \log g$, $\mathcal{F}$ = [Fe/H]) to model predictions
using Bayesian statistics. Specifically, we calculated a probability or weight $w$ for each
model according to:

$$w = p(\hat{M}_*) \, p(\hat{t}_*) \, p(\hat{\mathcal{F}}) \, p(\zeta) \exp\left[-\frac{(\mathcal{T}-\hat{\mathcal{T}})^2}{2\sigma_T^2} - \frac{(\mathcal{G}-\hat{\mathcal{G}})^2}{2\sigma_{\log g}^2} - \frac{(\mathcal{F}-\hat{\mathcal{F}})^2}{2\sigma_{Fe}^2}\right], \qquad (19)$$

where parameters with a "hat" are the Dartmouth model values and $p(M_*)$, $p(t_*)$, $p(\mathcal{F})$,
and $p(\zeta)$ are the priors for initial stellar mass (initial mass function, IMF), age, metallicity,
and a modified distance modulus $\zeta = \mu + 5 \log_{10} \sin b$, where $b$ is the galactic latitude.
The modified distance modulus accounts for the finite dispersion of stars above the plane
of the Milky Way, but neglects the vertical displacement of the Sun. We used an SDSS
r-band modulus $\mu = m_r - M_r$, where $m_r$ is the observed apparent magnitude and $M_r$ is the
absolute magnitude from the Dartmouth models. We ignored interstellar extinction, which
will be < 0.5 magnitudes (Schlegel et al. 1998). (While estimation of stellar parameters is
sensitive to interstellar reddening, the amount of interstellar extinction is small compared to
the uncertainties in the distance modulus.) Estimates of $T_e$ and [Fe/H] from the KIC were
adjusted by -100 K and 0.17 dex, respectively, and we used $\sigma_T$ = 200 K, $\sigma_{\log g}$ = 0.36 dex,
and $\sigma_{Fe}$ = 0.3 dex, based on a comparison of KIC values with those spectroscopic values
listed in B12.
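
The weighting of Equation [19] can be illustrated with a short sketch. It assumes independent gaussian errors in the three KIC parameters with the dispersions quoted above, and it collapses the four priors into a single number supplied by the caller. The dictionary keys, the function name `model_weight`, and the two-model grid are hypothetical and are not Dartmouth values.

```python
import math

SIGMA_TEFF = 200.0  # K, dispersion adopted for effective temperature
SIGMA_LOGG = 0.36   # dex, dispersion adopted for log g
SIGMA_FEH = 0.3     # dex, dispersion adopted for [Fe/H]


def model_weight(obs, model, prior=1.0):
    """Prior times an assumed gaussian likelihood of one stellar model."""
    chi2 = (((obs["teff"] - model["teff"]) / SIGMA_TEFF) ** 2
            + ((obs["logg"] - model["logg"]) / SIGMA_LOGG) ** 2
            + ((obs["feh"] - model["feh"]) / SIGMA_FEH) ** 2)
    return prior * math.exp(-0.5 * chi2)


# Hypothetical adjusted KIC parameters for one star.
star = {"teff": 5600.0, "logg": 4.4, "feh": 0.0}
# Hypothetical two-point model grid (a dwarf and a subgiant).
grid = [
    {"teff": 5600.0, "logg": 4.4, "feh": 0.0, "radius": 1.0},
    {"teff": 5800.0, "logg": 4.0, "feh": -0.3, "radius": 1.6},
]
weights = [model_weight(star, m) for m in grid]
```

In practice the `prior` argument would be the product of the IMF, age, metallicity, and distance-modulus priors evaluated for each model.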



For priors we adopted the Kroupa (2002) IMF, and a uniform age distribution over
1-13 Gyr. The latter corresponds to a constant rate of star formation since the advent of
the galactic disk (Oswalt et al. 1996; Liu & Chaboyer 2000), but ignores the youngest stars,
around which planets are more difficult to detect. The metallicity distribution of Kepler
target stars is unknown and may be complex; the field is not parallel to the Galactic plane
and may include members of a metal-poor "thick disk" population (Ruchti et al. 2011). We
used the metallicity distribution predicted by the TRILEGAL stellar population model
(Vanhollebeke et al. 2009) as a prior. Stars in the direction of the center of the Kepler field
($\ell$ = 76.32°, $b$ = 13.5°) were simulated to a cutoff magnitude $K_p$ = 16. When compared
to 2MASS counts, TRILEGAL counts agree with observations at least down to $b$ = 10°,
but fail at $b$ = 0, possibly due to incorrectly modeled bulge red giant branch stars and dust
(Girardi et al. 2005). However, the Kepler field cuts off at $b$ = 6° and only 18 of the 84
CCD centers lie at $b$ < 10°. The (mostly default) values for TRILEGAL parameters are
listed in Table [1].

TRILEGAL also reports a value of $\mu$ for each simulated star and we used these to
construct a prior distribution of $\zeta$. Our priors are relaxed in the sense they only exclude
very unlikely masses, ages, or metallicities. It is also possible to impose priors on the stellar
parameters $T_e$ and $\log g$ using the predictions of a stellar population model, but we consider
such predictions too uncertain to justify this approach.



For each star, Equation [19] returns an array of values for $w$ corresponding to the grid
of Dartmouth models. Most values of $w$ are negligibly small and the corresponding models
were ignored. From the remainder, the most probable (highest $w$) model and accompanying
parameters such as $R_*$ were identified. Statistics of the distribution of possible values were
calculated, e.g.:

$$\langle R_* \rangle = \frac{\sum w \hat{R}_*}{\sum w}. \qquad (20)$$

Because the distributions can be very non-gaussian, we defined the fractional uncertainty
in a stellar parameter to be one-half of the range encompassing 68% of the total probability
(normalized $w$) divided by the most probable value. We found that uncertainties in
the radii of late G- and K-type dwarf stars hosting KOIs are typically ~15%, but are
substantially higher (~100%) among some F- and G-type stars because of the coincidence
of the dwarf and (sub)giant branches (Figure [1]). Evolved stars (i.e. KIC $\log g$ < 4) also
have comparatively larger uncertainties. The cluster of putative M "dwarfs" with radius
uncertainties of ~25% might be misclassified giant stars (Mann et al. 2012). Our estimated
uncertainties are certainly lower bounds because (1) the errors in the stellar parameters
$T_e$, [Fe/H], and especially $\log g$ are themselves not gaussian-distributed, as presumed in
Equation [19], and (2) we do not consider errors in the Dartmouth models themselves.
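
The summary statistics described above can be sketched as follows. The text does not specify how the range enclosing 68% of the probability is placed, so this example assumes a central interval between the 16th and 84th percentiles of the weighted distribution; the function name and the three-model example are hypothetical.

```python
def radius_statistics(radii, weights):
    """Weighted mean, most probable value, and fractional uncertainty.

    Assumes the 68% range is the central 16th-84th percentile interval.
    """
    total = sum(weights)
    mean = sum(r * w for r, w in zip(radii, weights)) / total
    best = max(zip(weights, radii))[1]  # radius of the highest-weight model
    cumulative = 0.0
    low = high = None
    for r, w in sorted(zip(radii, weights)):
        cumulative += w / total
        if low is None and cumulative >= 0.16:
            low = r
        if high is None and cumulative >= 0.84:
            high = r
    return mean, best, 0.5 * (high - low) / best


# Hypothetical model radii (solar units) and weights for one star.
mean_r, best_r, frac_unc = radius_statistics([0.9, 1.0, 1.1], [1.0, 2.0, 1.0])
```

A star whose weights are split between the dwarf and subgiant branches would show a much wider 16th-84th percentile range than this toy case, giving the large fractional uncertainties described for some F- and G-type stars.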



3.2. Eddington Bias 



Eddington bias occurs when errors in measurement scatter more frequent values
in a population to less frequent values at a higher rate than the reverse process. This
systematically inflates the observed frequency of rare members (Eddington 1913). Because
the distribution of planets with radius is a steep power law (Cumming et al. 2008;
Howard et al. 2010; Mayor et al. 2011; Howard et al. 2012), errors in radius (fractional
standard deviation $\sigma_R$) will bias the number of larger planets upwards. This will inflate the
rate of planet occurrence $f$ above a given cutoff in radius $R_c$. Planets with radius $R_p$ will
appear to be larger than the cutoff if the error in stellar radius is larger than $R_c/R_p - 1$.
If errors in stellar radius are gaussian-distributed, the fraction of stars that satisfy that
condition is $\mathrm{erfc}\left((R_c/R_p - 1)/(\sqrt{2}\sigma_R)\right)/2$. The fractional upward bias in planet occurrence
is the integral of this function with the normalized planet radius distribution, minus the
intrinsic occurrence (normalized to unity):

$$\Delta f = \frac{\alpha}{2} \int_0^\infty x^{-(\alpha+1)} \, \mathrm{erfc}\left(\frac{1/x - 1}{\sqrt{2}\sigma_R}\right) dx - 1, \qquad (21)$$

where $x = R_p/R_c$. $\Delta f$ increases with $\sigma_R$ and, if $\alpha = 2.6$, reaches 18% when $\sigma_R$ = 30%.
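
Equation [21] can be evaluated numerically with the standard library alone. The sketch below substitutes $u = 1/x$, which turns the integrand into $u^{\alpha-1}\,\mathrm{erfc}((u-1)/\sqrt{2}\sigma_R)$, and applies a midpoint rule; the cutoff of the integration at $u = 1 + 12\sigma_R$ and the number of steps are assumptions of this example, chosen because the complementary error function is negligible beyond that point.

```python
import math


def eddington_bias(alpha, sigma_r, steps=20000):
    """Fractional excess of planets above a radius cutoff (Equation 21).

    Uses the substitution u = 1/x and a midpoint rule on [0, 1 + 12*sigma_r].
    """
    upper = 1.0 + 12.0 * sigma_r
    h = upper / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += u ** (alpha - 1.0) * math.erfc((u - 1.0) / (math.sqrt(2.0) * sigma_r))
    return 0.5 * alpha * total * h - 1.0


# Bias for the power-law index and radius error discussed in the text.
bias_30 = eddington_bias(alpha=2.6, sigma_r=0.30)
bias_10 = eddington_bias(alpha=2.6, sigma_r=0.10)
```

The value of `bias_30` can be compared with the 18% quoted above, and `bias_10` shows how quickly the effect shrinks as stellar radii become better determined.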

We estimated the amount of Eddington bias in the apparent radius distribution of KOIs
using the procedures described in Section [3.1]. For each KOI we calculated the likelihood
weight $w$ (Equation [19]) for all possible stellar models consistent with the parameters of the
host star. Corresponding to each model we calculated a revised planet radius $R_p \times (R'_*/R_*)$,
where $R_p$ is the radius from B12, $R'_*$ is the model stellar radius and $R_*$ is the stellar radius
of the maximum likelihood model (highest $w$). The radius distribution, weighted by $w$,
is summed over all KOI host stars and normalized. This is compared to the observed
distribution of $R_p$ (Figure [2]). The latter is not the intrinsic distribution, which must
account for the probability that a planet transits and is detected (Howard et al. 2012).
As expected, Eddington bias increases the apparent number of Neptune-size and larger
planets. The bias is 17% above $3.4 R_\oplus$, demarcated by the vertical dashed line in Figure
[2] where the normalized distributions are equal. The bias also suppresses the peak in the
distribution at a Jupiter radius. Corollaries of these results are that the actual occurrence
rate of Neptune-size planets is smaller than previously reported (i.e., Howard et al. 2012),
and that the intrinsic peak at $R_p \sim 1 R_J$ is more pronounced than is apparent.

In addition, Eddington bias decreases the apparent slope of the radius distribution (Figure 2). This is a consequence of the observed turnover in the number of planets smaller than $\sim 2\,R_\oplus$, which determines whether more large planets are scattered to smaller radii than vice versa. Kepler observations are incomplete for $R_p < 2\,R_\oplus$, and while the intrinsic radius distribution of planets is presumed to turn over, the radius at which this actually occurs is not known and awaits a better understanding of the efficiency of Kepler detection of small signals. If the turnover below $2\,R_\oplus$ is real, then the intrinsic slope of the radius distribution is steeper than observed ($\alpha = 2.6$). But if a scale-free power-law distribution continues to much smaller radii, then Eddington bias affects the magnitude, but not the slope, of the distribution.

We simulated Eddington bias on artificial samples of planets with radii drawn from a power-law distribution with variable index $\alpha$. These radii replaced actual KOI estimates in a repeat of the analysis described above. The power-law index of the binned apparent distribution $p(R_i)$ above some minimum radius $R_{\min}$ is calculated by maximum likelihood: $\alpha = \sum_i p(R_i) / \sum_i p(R_i) \log(R_i/R_{\min})$, where the summation is over all $R_i > R_{\min}$. As expected, while Eddington bias significantly increases the fraction of planets with $R > R_{\min}$, the power-law index is relatively unchanged (Figure 3).
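A weighted maximum-likelihood estimate of this kind can be sketched as follows. The sketch assumes the index refers to a cumulative (Pareto-like) distribution, $N(>R) \propto R^{-\alpha}$, for which the standard estimator is the summed weight divided by the weighted sum of $\ln(R_i/R_{\min})$. The function name and the use of natural logarithms are choices made for this example, not taken from the text.

```python
import math

def power_law_index(radii, weights, r_min):
    """Weighted maximum-likelihood index for N(>R) ~ R**(-alpha), R > r_min.

    radii   : radii (or bin centers) R_i
    weights : corresponding weights p(R_i); use 1.0 for unbinned samples
    r_min   : minimum radius; only R_i > r_min enter the sums
    """
    num = 0.0
    den = 0.0
    for r, w in zip(radii, weights):
        if r > r_min:
            num += w
            den += w * math.log(r / r_min)
    if den <= 0.0:
        raise ValueError("no weight above r_min")
    return num / den
```

Applying the function to artificial radii before and after scattering them by the stellar-radius uncertainty is one way to check how much the recovered index shifts.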

3.3. Malmquist Bias

Malmquist bias is the preferential inclusion of intrinsically luminous objects in a magnitude-limited survey due to the rapid increase in sampling volume ($\propto d_{\max}^3$) with the distance $d_{\max}$ to which an object is included. Among large, readily detected objects (planets) in a magnitude-limited transit survey, the bias is even greater ($\sim d_{\max}^4$) because the probability of a transiting geometry is proportional to $R_*$ which, at a given effective temperature, scales with $d_{\max}$ (see Section 2.2). At a given apparent magnitude and planet radius, there is a maximum stellar radius $R_*$ to which a survey is essentially complete, i.e. not limited by the SNR of a transit event.

We estimated $R_*$ as a function of $R_p$ by establishing the detection limit at different Kepler magnitudes. The Kepler target catalog was constructed with different criteria for stars with $K_p < 14$ and $14 < K_p < 16$ (Batalha et al. 2010); it is probably nearly complete for dwarf stars to $K_p = 14$ but only includes selected dwarfs with $14 < K_p < 16$ (Batalha et al. 2010). We adopted a SNR limit of 7.1 and an observation period of 487 d (B12). To estimate the noise of a typical dwarf star we performed a polynomial fit to a running median ($N = 1000$) of 3 hr combined differential photometric precision (CDPP) values for Kepler targets with $\log g > 4$, presumed to be mostly dwarfs. This gave an estimate of the intrinsic 3 hr RMS noise level as a function of $K_p$:

$$\log \sigma_3(\mathrm{dwarfs}) \approx -4.27 + 0.116\,(K_p - 12) + 0.0247\,(K_p - 12)^2. \tag{22}$$

The median noise at $K_p = 12$ is 54 ppm. We performed a similar analysis on stars with KIC $\log g < 4$, presumably subgiants and giants, that constitute a locus of comparatively "noisy" targets, and found:

$$\log \sigma_3(\mathrm{giants}) \approx -3.69 + 0.045\,(K_p - 12) + 0.115\,(K_p - 12)^2. \tag{23}$$

For $K_p = 14$ dwarfs, $R_* = 1.72\,R_\odot$ and at $K_p = 16$, $R_* = 0.77\,R_\odot$. At $K_p = 14$, for a median orbital period $P \sim 16$ d and $R_p = 2\,R_\oplus$, Malmquist bias favors stars as large as $2.3\,R_\odot$. At $K_p = 16$, only stars with $R_* < 1.0\,R_\odot$ are favored because of higher noise at fainter magnitudes. The situation is more extreme for giant planets ($R_p \sim 10\,R_\oplus$), where Malmquist bias will favor evolved stars as large as 10-20 $R_\odot$, presuming giant planets exist around such stars, as we discuss below.
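The two noise relations (Equations 22 and 23) are simple to evaluate numerically. The short sketch below converts them to parts per million; the function names are illustrative, and the relations should only be trusted over the magnitude range of the fit.

```python
def log_sigma3_dwarfs(kp):
    """Equation 22: log10 of the 3 hr RMS noise for dwarfs (log g > 4)."""
    x = kp - 12.0
    return -4.27 + 0.116 * x + 0.0247 * x ** 2

def log_sigma3_giants(kp):
    """Equation 23: log10 of the 3 hr RMS noise for stars with log g < 4."""
    x = kp - 12.0
    return -3.69 + 0.045 * x + 0.115 * x ** 2

def sigma3_ppm(kp, dwarfs=True):
    """3 hr RMS noise in parts per million at Kepler magnitude kp."""
    log_sigma = log_sigma3_dwarfs(kp) if dwarfs else log_sigma3_giants(kp)
    return 1e6 * 10.0 ** log_sigma
```

For example, `sigma3_ppm(12.0)` evaluates to about 54 ppm, matching the median dwarf noise quoted above, and the dwarf noise rises steadily toward $K_p = 16$.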

Bias towards larger stars, coupled with uncertainties in stellar radius, leads to underestimates of stellar (and hence planetary) radii. We quantified this effect using the machinery described in Section 3.1 with the addition of a Malmquist bias factor. For each KOI-hosting star, we evaluated the mean stellar radius by averaging over all stellar models weighted by $w$ (from Equation 19) and multiplied by either $(R_*'/R_*)^4$, where $R_*' < R_*(P, K_p)$, or $(R_*'/R_*)^{-1/2}$ otherwise.
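The bias-weighted average can be illustrated with a short Python sketch. Several details here are assumptions rather than the authors' stated procedure: the ratio in the bias factor is taken relative to the completeness-limit radius (so that the factor is continuous at the limit), the $-1/2$ power is applied to models at or above that limit, and the names `malmquist_factor` and `mean_radii` are hypothetical.

```python
def malmquist_factor(r_model, r_limit):
    """Bias factor for a model stellar radius (assumed form).

    Below the completeness-limit radius the factor grows as radius**4
    (sampling volume times transit probability); above it, detection is
    SNR-limited and the factor is assumed to fall as radius**(-1/2).
    """
    ratio = r_model / r_limit
    return ratio ** 4 if r_model < r_limit else ratio ** -0.5

def mean_radii(models, r_limit):
    """Return (naive_mean, bias_weighted_mean) of the stellar radius.

    models: list of (w, r_star) pairs, w being the likelihood weight.
    """
    w_sum = sum(w for w, _ in models)
    naive = sum(w * r for w, r in models) / w_sum
    biased_w = [(w * malmquist_factor(r, r_limit), r) for w, r in models]
    bw_sum = sum(w for w, _ in biased_w)
    biased = sum(w * r for w, r in biased_w) / bw_sum
    return naive, biased
```

The ratio `naive / biased` is then the factor by which the radius is underestimated when the bias is ignored.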

The ratio of the "naive" mean model radius to the bias-weighted mean radius is plotted in Figure 4 vs. the nominal planet radius published in B12. Deviation of this factor from unity can be considered the error in radius that results if Malmquist bias is not taken into account. About two-thirds of all KOI-hosting stars, and the vast majority of those hosting planets smaller than Neptune, have predicted Malmquist bias values <10%. However, the majority of larger planets may have significantly underestimated radii, some by a factor of two. This dichotomy occurs because Kepler detection of large planets is limited by the magnitude limit of the target catalog, not the SNR of transit. We emphasize that these calculations are statistical, i.e. we are calculating the expectation values of probability distributions with stellar radius, and that actual errors will vary. Nevertheless, the host stars of many giant planets may be larger, more distant, and more luminous, and the radii of their planets may be significantly underestimated. Inclusion of larger, evolved stars means that some KOIs may be astrophysical false positives, e.g. M dwarf companions masquerading as planets (Charbonneau et al. 2004; Almenara et al. 2009), a possibility that we discuss in Section 4.



3.4. Metallicity Bias 



The metallicity of host stars is an important parameter in studies of planet statistics. A correlation between stellar metallicity and the presence of giant planets has been unambiguously established (Gonzalez 1998; Santos et al. 2004; Fischer & Valenti 2005; Buchhave et al. 2012) and is consistent with a prediction by the core-triggered instability theory of giant planet formation (Mizuno 1980), i.e. that a solid core that initiates runaway accretion before the gas dissipates is more likely to form in a disk with a higher abundance of solids. Doppler surveys have failed to find any correlation between metallicity and the occurrence of Neptune-size or smaller planets (Sousa et al. 2008; Mayor et al. 2009, 2011).



Schlaufman & Laughlin (2011) found that the average g-r color of most Kepler stars with small candidate planets was no different from the average of all stars at a given J-H color, except for late K and early M-type stars; those with planets have redder g-r colors, and Schlaufman & Laughlin argued that these are more metal-rich. However, this difference may be an artifact of contamination of the sample by evolved stars, which have bluer g-r colors than dwarfs and make the overall sample, but not the KOI-hosting sample, bluer (Mann et al. 2012). Indeed, g-r color might be insensitive to or depend only weakly on metallicity for these spectral types (Lépine et al. 2012). Muirhead et al. (2012) report metallicities of 78 late K and M dwarfs with KOIs based on infrared spectra. The mean value, -0.09, is consistent with the metallicity of M dwarfs in the solar neighborhood (Schlaufman & Laughlin 2010; Woolf & West 2012). The average metallicity of Kepler M dwarfs is not known, but these intrinsically faint stars are within a few hundred pc of the Sun (Gaidos et al. 2012).



The metallicities of stars of transiting planets need not be representative of the underlying population of planet-hosting stars. Metals are an important source of opacity in the atmospheres of cool stars, and, all else being equal, metal-poor dwarf stars should have smaller radii. A transiting planet will be more detectable around a metal-poor subdwarf than a metal-rich dwarf star, and thus the host stars of KOIs will be biased towards metal-poor representatives of the overall population. If sufficiently large, this bias could obfuscate any intrinsic relationship between stellar metallicity and the presence of planets.

We calculated the metallicity bias, i.e. the expected metallicity of stars with detected transiting planets minus the expected metallicity of all stars, for all Kepler Quarter 6 target stars using Eqns. 2 and 11 and the methods described in Section 3.1. The difference between the "naive" mean metallicity of Dartmouth models for each star and the biased mean using the factor of Equation 11 is plotted vs. adjusted KIC effective temperature in Figure 5. As expected, the metallicity bias is negative except for a locus of positive values corresponding to evolved stars, for which radius decreases with increasing metallicity (e.g., Zieliński et al. 2012). The bias is small (mean of -0.017 among dwarfs) for the following reasons: (i) the geometric transit probability is proportional to stellar radius and thus increases with metallicity, countering the effect of metallicity on transit depth; and (ii) the effect of metallicity on stellar radius is most pronounced among comparatively rare subdwarfs but is only modest around solar metallicity, especially for the coolest stars (Boyajian et al. 2012).



3.5. Covariant errors and "inflated" Jupiters 

At the time the first exoplanet around a main sequence star was confirmed, Guillot et al. (1996) realized that highly-irradiated giant planets on close-in orbits may have anomalously large radii. After sufficient numbers of transiting giant planets were discovered, it became apparent that some were "inflated" compared to theoretical predictions (Burrows et al. 2000; Baraffe et al. 2003). Planets larger than $\approx 1.2\,R_J$ cannot be explained by conventional interior models of gas giants and require an additional source of internal energy to inflate them (Fortney et al. 2010). Several non-exclusive explanations for the requisite energy source have been put forward (Bodenheimer et al. 2001; Showman & Guillot 2002; Batygin & Stevenson 2010). One important clue is that planets experiencing higher irradiance or having higher emitting temperature are more likely to be inflated. Correlations between equilibrium temperature and radius have been reported among transiting giant planets discovered in ground-based surveys (Laughlin et al. 2011; Enoch et al. 2012). Among Kepler giant planet candidates, inflation appears to occur only above an irradiance of about $2 \times 10^8$ ergs s$^{-1}$ cm$^{-2}$ (Demory & Seager 2011, hereafter D11).



Where information about stellar parameters is limited, spurious correlations can appear if two supposedly independent planetary parameters are related to the same, uncertain stellar parameter. In the absence of parallax or precise information on surface gravity, the radius of a star is constrained only by models of stellar atmospheres, stellar evolution, and galactic population. Uncertainty in stellar radius translates into corresponding uncertainties in both stellar luminosity and transiting planet radius. Because the radiation that a planet receives from a star is proportional to stellar luminosity, errors in irradiance and planet radius due to errors in stellar radius will be positively covariant. At least in principle, an apparent, positive trend between irradiation and planet radius could be created merely by errors in stellar radius.
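The sign and size of this covariance follow from a short scaling argument. The sketch below assumes a fixed effective temperature $T_e$, transit depth $\delta$, and orbital distance $a$ (neglecting the weak dependence of $a$ on the model stellar mass), and ignores limb darkening. These simplifications are made here for illustration and are not part of the analysis itself.

```latex
\begin{align}
R_p &= R_* \sqrt{\delta}, &
L_* &= 4\pi R_*^2 \sigma_{\rm SB} T_e^4, &
F &= \frac{L_*}{4\pi a^2}
\\
\Rightarrow\quad
\frac{\Delta R_p}{R_p} &\approx \frac{\Delta R_*}{R_*}, &
\frac{\Delta F}{F} &\approx 2\,\frac{\Delta R_*}{R_*}
  \approx 2\,\frac{\Delta R_p}{R_p}.
\end{align}
```

Under these assumptions a 20% overestimate of the stellar radius raises the inferred planet radius by 20% and the inferred irradiance by 44% ($1.2^2 = 1.44$), so errors move a point along a line of slope roughly 1/2 in $\log R_p$ versus $\log F$.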

We simulated the impact of this systematic with an analysis of KOIs similar, but not identical, to that of D11. We selected all KOIs with estimated radii of $8\,R_\oplus < R_p < 22\,R_\oplus$ from B12, excluding those listed as false positives or "ambiguous" in Table 1 of D11. As in Section 3.1, we identified the best-fit Dartmouth model for each host star based on a $\chi^2$ minimization of the difference with adjusted KIC values of $T_e$, $\log g$, and [Fe/H], after applying corrections of -100 K to $T_e$ and 0.17 dex to [Fe/H] (Br11). We assumed standard deviations of 200 K, 0.36 dex, and 0.3 dex, respectively, based on Br11 and 190 stars where both KIC and spectroscopy-based parameters are available (B12). If no KIC value for [Fe/H] was available we assumed solar metallicity. To estimate the maximum possible effect, no constraints other than the Dartmouth evolutionary tracks were used, i.e. we equally weighted masses, ages between 1-13 Gyr, and metallicities between -2.5 and +0.5 dex. Orbit-averaged stellar irradiation of the planet was calculated from the model luminosity and mass and the orbital period, assuming a circular orbit. (Non-circular orbits change the mean irradiance only slightly.) Planet radius was calculated from the transit depth and stellar model radius, and we did not account for limb darkening. The encircled points in Figure 6 indicate the best-fit planet radius vs. irradiance. Three KOIs (217.01, 774.01, and 1547.01) have estimated irradiances $< 2 \times 10^8$ ergs s$^{-1}$ cm$^{-2}$ and $R_p > 1.2\,R_J$, but only marginally so.
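The irradiance calculation described above amounts to Kepler's third law followed by the inverse-square law. A minimal sketch in cgs units follows; the constant values and the function name `irradiance_cgs` are choices made for this example, and the planet mass is neglected relative to the stellar mass.

```python
import math

G_CGS = 6.674e-8      # gravitational constant, cm^3 g^-1 s^-2
M_SUN_G = 1.989e33    # solar mass, g
L_SUN_ERG = 3.828e33  # solar luminosity, erg s^-1
DAY_S = 86400.0       # seconds per day

def irradiance_cgs(l_star_lsun, m_star_msun, period_days):
    """Stellar irradiance (erg s^-1 cm^-2) on a planet in a circular orbit.

    l_star_lsun : model stellar luminosity, in solar units
    m_star_msun : model stellar mass, in solar units
    period_days : orbital period, in days
    """
    p = period_days * DAY_S
    # Kepler's third law for the semi-major axis (planet mass neglected).
    a = (G_CGS * m_star_msun * M_SUN_G * p ** 2 / (4.0 * math.pi ** 2)) ** (1.0 / 3.0)
    return l_star_lsun * L_SUN_ERG / (4.0 * math.pi * a ** 2)
```

As a sanity check, a one-year orbit around a Sun-like star gives roughly $1.4 \times 10^6$ ergs s$^{-1}$ cm$^{-2}$ (the solar constant), while a 3 d orbit around the same star lies well above the $2 \times 10^8$ ergs s$^{-1}$ cm$^{-2}$ threshold discussed here.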

Fifteen KOIs have re-estimated radii $< 0.5\,R_J$ even though the values listed in B12 exceed the criterion $> 0.714\,R_J$. Twelve of these have KIC impact parameters $b > 1$, suggesting problematic (or extreme grazing) transit solutions. Another (KOI 1419.01) has an implausible $b = 0.994$, which is inconsistent with its transit duration of $t = 1.36$ h and period $P = 1.36$ d. KOI 377.02 (Kepler-9b) has an erroneous transit depth reported in the MAST. The best-fit Dartmouth model assigns a somewhat smaller radius ($0.48\,R_\odot$) to the host star of KOI 1193.01 and thus makes the planet smaller as well. We excluded all planets with newly estimated radii $R_p < 8\,R_\oplus$ from our analysis.

We assessed the trends produced by correlated errors in planet radius and irradiation by considering all Dartmouth models that satisfy $\chi^2 < \chi^2_{\min} + 8.02$, where $\chi^2_{\min}$ is the minimum (best-fit) value, and 8.02 is the $\Delta\chi^2$ corresponding to a 95.4% ($2\sigma$) confidence interval for $\nu = 3$ degrees of freedom (stellar parameters). Because there are too many models to plot, we only show a random subsample of 200 such models for each KOI as the small points in Figure 6. These clearly show that correlated errors will tend to scatter points between the high irradiation/inflated and the low irradiation/uninflated regions of the diagram.
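The quoted threshold can be checked directly, because the chi-square distribution with three degrees of freedom has a closed-form cumulative distribution. The sketch below uses only the Python standard library; the helper `select_models` and its `(chi2, model)` input layout are hypothetical.

```python
import math

def chi2_cdf_3dof(x):
    """CDF of the chi-square distribution with 3 degrees of freedom."""
    if x <= 0.0:
        return 0.0
    return (math.erf(math.sqrt(x / 2.0))
            - math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

def select_models(models, delta_chi2=8.02):
    """Keep models with chi2 < chi2_min + delta_chi2.

    models: list of (chi2, model) pairs (hypothetical layout).
    """
    chi2_min = min(c for c, _ in models)
    return [(c, m) for c, m in models if c < chi2_min + delta_chi2]
```

Evaluating `chi2_cdf_3dof(8.02)` gives approximately 0.954, consistent with the $2\sigma$ interval stated above.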

The paucity of KOIs with inflated radii ($R_p > 1.2\,R_J$) in the low irradiance region (upper right hand domain of Figure 6) supports the contention that the inflation of giant planets is related to stellar irradiation or planet equilibrium temperature. Furthermore, Kendall's and Spearman's rank correlation tests of all KOIs with $R_p > 0.714\,R_J$ yield correlation coefficients of 0.246 and 0.364, respectively, and corresponding $p$ (significance) values of $4.6 \times 10^{-5}$ and $3.1 \times 10^{-5}$. These low false-positive probabilities indicate a significant correlation between irradiation and planet radius. However, these statistics do not account for the systematic effect of correlated errors in radius and irradiation.

We simulated the effect of correlated errors by analyzing 10000 null realizations of the data in which the radii and orbital periods of KOIs were randomly shuffled among host stars and the transit depths were recomputed using Equation 1, thus destroying any intrinsic correlation between radius and irradiation. In computing each realization we included all KOIs with $R_p > 3\,R_\oplus$ to account for small planets that may appear larger, but in each Monte Carlo realization, as with the real sample, we limited the statistical analysis to 8-22 $R_\oplus$. New ("observed") estimates of KIC stellar parameters were constructed from the "true" values by adding random, Gaussian-distributed offsets with standard deviations of 200 K for $T_e$, 0.36 dex for $\log g$, and 0.3 dex for [Fe/H]. Best-fit Dartmouth models were found for each parameter set, the planet radii and irradiation values were determined, and the correlation statistics were calculated. New $p$ values for the fraction of KOIs in the low-irradiance/inflated-radius zone, Kendall's $\tau$, and Spearman's rank coefficient were computed as the fraction of Monte Carlo realizations that are more significant than the observed values. The distributions for the first two metrics are shown in Figs. 7 and 8, and the $p$ values are $1.4 \times 10^{-3}$ and $6 \times 10^{-4}$, respectively. The result for Spearman's rank coefficient is similar, with $p = 6 \times 10^{-4}$.
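The idea of an empirical $p$ value from null realizations can be illustrated with a much simpler permutation test. The sketch below is not the procedure described above: the real null realizations also redraw the stellar parameters and refit stellar models, which is what allows them to capture correlated errors, whereas this example only shuffles one variable. It uses Kendall's tau-a (no correction for ties), and the function names are illustrative.

```python
import random

def kendall_tau(x, y):
    """Kendall's tau-a (no tie correction) for paired samples x and y."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            if prod > 0:
                s += 1   # concordant pair
            elif prod < 0:
                s -= 1   # discordant pair
    return 2.0 * s / (n * (n - 1))

def shuffle_p_value(x, y, n_real=1000, seed=0):
    """Fraction of shuffled realizations with tau >= the observed tau."""
    rng = random.Random(seed)
    tau_obs = kendall_tau(x, y)
    y_shuffled = list(y)
    n_exceed = 0
    for _ in range(n_real):
        rng.shuffle(y_shuffled)
        if kendall_tau(x, y_shuffled) >= tau_obs:
            n_exceed += 1
    return tau_obs, n_exceed / n_real
```

Replacing the shuffle with a full re-simulation of stellar parameters, model fits, and sample cuts turns this into the kind of null test used in this section.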



3.6. Stellar metallicity and "shrunken" Jupiters 



Dodson-Robinson (2012, hereafter DR12) reported a weakly significant ($p = 0.02$ or $2.3\sigma$) trend of decreasing radius of Kepler (candidate) giant planets with increasing metallicity of the host star. She examined the ratio $R_p/R_*$ of 218 KOIs from Borucki et al. (2011) with estimated radii of 5-20 $R_\oplus$ and the correlation with estimated values of [Fe/H] from the KIC. She interpreted the decline as evidence that giant planets around metal-rich stars have larger solid cores and, for the same total planet mass, smaller radii (Guillot 2005).



Figure 9 is an updated version of Figure 1 in DR12 based on the more recent release of KOIs with revised radii (Batalha 2012). It includes 225 KOIs with $5\,R_\oplus < R_p < 20\,R_\oplus$ and host stars with KIC-determined metallicities. As in Figure 1 from DR12, a running median ($N = 21$) is plotted. The Kendall $\tau$ correlation coefficient is -0.032, indicating no significant correlation ($p = 0.48$). We were unable to reproduce the result of DR12 by simple cuts on this sample to approximate the earlier KOI sample, perhaps because many stellar radii (and hence planet radii) have been revised (Batalha 2012). We also emphasize that the values of [Fe/H] in the KIC are no more accurate than ±0.3 dex (Br11).

Irrespective of any physical phenomenon, one would expect to observe a decrease in $R_p/R_*$ with increasing metallicity simply because metal-rich dwarfs tend to be larger than metal-poor dwarfs, and hence transit depths will be smaller (Equation 1). We modeled this effect with 10000 Monte Carlo realizations of the KOI catalog. Increasing the radii of the host stars of a given planet population has two effects: transit depths become smaller, so the planets appear smaller, and some planets may fall below the lower radius cutoff ($5\,R_\oplus$) and be excluded from the analysis. The reverse is true for lower metallicity; planets appear larger and a few planets may exceed the maximum cutoff ($20\,R_\oplus$). We therefore considered KOIs over a broader (3-25 $R_\oplus$) range of radii, adopted this sample as representing the intrinsic ("true") distribution of radii, estimated their apparent radii from the radius of the star and transit depth, and then applied the same radius criteria as DR12. We randomly shuffled the planet population among the host stars, thus destroying any intrinsic radius-metallicity correlation, computed the radii of the stars using the Dartmouth stellar evolution models, and recalculated the transit depths.

Each Monte Carlo host star was assigned the corrected $T_e$ of the actual star it replaced. We assigned an observed metallicity based on the KIC value $F$, a systematic correction $\Delta$ of 0.17 dex (Br11), a random normally-distributed error $\sigma$ of 0.3 dex, and a prior distribution of intrinsic metallicities that is a Gaussian with mean $\bar{F}$ and standard deviation $\epsilon$. This is equivalent to drawing metallicities from a single normal distribution with mean $(\epsilon^2 (F + \Delta) + \sigma^2 \bar{F})/(\epsilon^2 + \sigma^2)$ and variance $\epsilon^2 \sigma^2/(\epsilon^2 + \sigma^2)$. The radius of each Monte Carlo star was taken to be the median of all model radii with $\log g > 4$ (presuming they are dwarf stars), [Fe/H] within 0.15 dex of the Monte Carlo model, and $T_e$ within 100 K. We did not apply any age criterion other than 1-13 Gyr. We then calculated $R_p/R_*$ using the shuffled planet radius and the median model stellar radius. For each Monte Carlo sample, we calculated Kendall's $\tau$ and the false positive probability for a correlation between the observed metallicities and the artificial transit depths.
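The combination of a corrected measurement with a Gaussian prior is the familiar product of two normal distributions. In the short sketch below, the function and argument names are illustrative, and the example numbers for the prior are placeholders rather than values from the text. Note that $\epsilon^2\sigma^2/(\epsilon^2+\sigma^2)$ is the variance of the combined distribution, so the function returns its square root as the standard deviation.

```python
import math

def combine_metallicity(kic_feh, correction, sigma, prior_mean, prior_sd):
    """Mean and standard deviation of the combined normal distribution.

    kic_feh    : KIC [Fe/H] value (dex)
    correction : systematic correction added to the KIC value (dex)
    sigma      : random measurement error (dex)
    prior_mean : mean of the Gaussian prior on intrinsic metallicity (dex)
    prior_sd   : standard deviation of that prior (dex)
    """
    var_obs = sigma ** 2
    var_prior = prior_sd ** 2
    mean = ((var_prior * (kic_feh + correction) + var_obs * prior_mean)
            / (var_prior + var_obs))
    variance = var_prior * var_obs / (var_prior + var_obs)
    return mean, math.sqrt(variance)
```

For instance, `combine_metallicity(0.0, 0.17, 0.3, -0.1, 0.2)` pulls the corrected value of +0.17 dex most of the way toward the (placeholder) prior mean, because the measurement error is larger than the prior width.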

Median-filtered ($n = 21$) curves from these Monte Carlo realizations typically show a decline of $R_p/R_*$ with increasing metallicity. Figure 10 shows the distribution of $\tau$ from 10000 null realizations. The value of $\tau$ from the actual KOI sample is plotted as the dashed line. Of these null realizations, 61.6% produce a significant ($p < 0.01$) correlation and 71.6% have values below (and thus more significant than) the actual value of -0.032. For comparison, the DR12 value is -0.17. Thus, negative correlations between metallicity and $R_p/R_*$ are to be expected solely as a consequence of the metallicity-radius relation of stars, although these Monte Carlo simulations indicate that there is a ~40% chance that random errors in KIC [Fe/H] values would prevent such a correlation from being detected.

4. Discussion

We have shown that selection effects for both transiting planets and the target stars of transit surveys, combined with uncertainties in stellar radii, can bias the properties of host stars and their planets. These effects are in addition to those previously identified by Gaudi (2005), Gaudi et al. (2005), and Pont et al. (2006), which concern effects arising from the sensitivity of detection efficiency to planet radius and period. We have analyzed the effects of these systematics on the Kepler survey and its catalogs of target stars and candidate planets, using current models of stellar evolution and galactic stellar populations to infer the properties of Kepler stars. We did not apply constraints from the relation between stellar density, transit duration, and orbital period because the relation also depends on unknown orbital eccentricity and argument of periastron, and is not applicable to non-KOI stars.

We found that Eddington bias from the steep distribution of KOIs with radius results in an overestimation of the overall frequency of planets with $R_p > 2\,R_\oplus$ by about 15-20% of the actual value. We also find that Eddington bias acts to soften the "bump" in the distribution at Jupiter-size planets. This leads us to predict that the intrinsic peak at that radius is more pronounced. The effect on the distribution of smaller planets depends on whether the turnover in the radius distribution below $2\,R_\oplus$ is real or the result of incompleteness. If the former, Eddington bias acts to flatten the apparent slope of the radius distribution, and in this case we predict that the actual slope is steeper than the $\alpha = 2.6$ power-law. Otherwise, the effect of Eddington bias on the power-law index is about 0.1.

We made statistical estimates of Malmquist bias as a consequence of the magnitude limit of the target catalog. The estimated bias for two-thirds of KOI systems, including most KOIs smaller than Neptune, is < 10%. However, we found that bias is more prevalent and pronounced (up to a factor of two in radius) among larger candidate planets and their host stars, because detection of these systems is governed by the apparent magnitude limit of the target catalog rather than the SNR of transit detection. A Malmquist bias towards more luminous stars raises the possibility of inclusion of unidentified evolved stars within the Kepler target catalog (in addition to a number of deliberately selected and clearly identified giant stars). Nominally, stars with large radii were removed by a vetting process that used a criterion of Kepler detection of a $2\,R_\oplus$ planet (Batalha et al. 2010). However, KIC-derived stellar radii are based on estimates of $\log g$ and many of these are problematic. KIC photometry provides no information on the gravity of stars with $T_e > 5400$ K ($g - r < 0.65$), and subgiants would be assigned erroneously high $\log g$ (Br11).

There are bona fide subgiants hosting KOIs, e.g. the F5 subgiant HD 179070 (Howell et al. 2012). Spectroscopy of stars hosting candidate giant planets has revealed other instances in which subgiants were misclassified as cooler, main sequence dwarfs in the KIC. Santerne et al. (2011) report a hot-Jupiter-hosting F-type subgiant ($M_* \sim 1.48\,M_\odot$, $R_* \sim 2.13\,R_\odot$). Based on spectra, they estimate $\log g = 4.1 \pm 0.2$, in contrast to its KIC value of 4.55. Likewise, the host of KOI-423b, assigned $\log g = 4.45$ in the KIC, is an F7IV subgiant with $\log g = 4.1$ (Bouchy et al. 2011). Three of five undiluted eclipsing binaries identified by Santerne et al. (2012) as false positives among Kepler giant planet candidates have masses larger than $1\,M_\odot$, and one of these is definitely an evolved star. The mean difference between 190 pairs of KIC and spectroscopic values of $\log g$ reported in Batalha (2012) is only 0.02 dex (standard deviation of 0.36 dex). Nevertheless, asteroseismically-derived $\log g$ values average 0.05-0.17 dex lower than KIC values, and asteroseismically-determined radii are up to 50% larger (Verner et al. 2011; Bruntt et al. 2012).



Among KOI-hosting stars whose radius has been underestimated, small planets may actually be larger, some even Jupiter-size. In turn, giant "planets" may turn out to be diluted or undiluted stellar companions, a significant source of astrophysical false positives in transit surveys (Charbonneau et al. 2004; Almenara et al. 2009). Based on a preliminary Doppler survey, Santerne et al. (2012) estimated that about 40% of candidate giant planets are false positives and about one quarter of those are undiluted eclipsing binaries. This also means that estimates of the occurrence of Jupiters on close-in orbits (Howard et al. 2012) must be revised downwards. Wright et al. (2012) report that the occurrence of "hot Jupiters" ($P < 10$ d) in the Kepler catalog is only half that seen in Doppler surveys, and adjustment for a high false-positive rate would worsen this discrepancy.

One explanation for the discrepancy between the Kepler and Doppler surveys might be the presence of misidentified subgiant stars in the Kepler target catalog. The intrinsic distribution of planets may be different around evolved stars compared to main sequence



stars. Planets have been discovered around subgiant stars ( IButler et al. 



634 planets appear to be rare with 0.6 AU (P < 120 d) of clump GK giants (ISato et al 



2010 



Johnson et al. 



2006) , but 



201 lh - CoRoT-21b may be an exception (IPatzold et al. 



giant 



2008 



201 lh . The 



timescale of the decay of a planet's orbit due to dissipation of tides in a star's convective envelope scales as $R_*^{-8} M_{\rm env}$, where $M_{\rm env}$ is the mass of the envelope. Hot Jupiters are likely to be destroyed by infall and disruption inside the Roche lobe as a star evolves off the main sequence, expands, and its convective envelope thickens (Kunitomo et al. 2011). Thus, one explanation for the comparative paucity of hot Jupiters in the KOI catalog is that, because of Malmquist bias, many Kepler targets are older stars or subgiants for which hot Jupiters cannot be detected, have been miscategorized as Neptunes, or have been destroyed by orbital decay. A comparison between the distribution of $\log g$ predicted by TRILEGAL and that of the KIC suggests no large (>10%) population of unidentified subgiants; however, spectroscopy of candidate subgiants is needed to actually test this conjecture.

We have shown that, because metal-poor stars tend to have smaller radii than their metal-rich counterparts, stars with transiting planets will be biased towards metal-poor members, independent of any correlation between planets and metallicity. However, we estimate that this metallicity bias is only about -0.02 dex and can be neglected. Thus a comparison between the mean metallicity of stars with transiting planets and that of the overall target population is appropriate. The mean metallicity of M dwarfs with KOIs, -0.09 (Muirhead et al. 2012), and of solar-type stars with small planets, -0.01 (Buchhave et al. 2012), appears similar to that of the solar neighborhood: Schlaufman & Laughlin (2010) report a mean metallicity of $-0.14 \pm 0.06$ for a volume-limited local sample of M dwarfs using a photometric calibration, and Casagrande et al. (2011) report a median metallicity of -0.06 for all stars in the solar neighborhood. Whether the overall Kepler target population has a similar metallicity distribution is not yet known, and additional observations are required. From our calculations we conclude that such a comparison would not suffer from significant metallicity bias, but must take into account a dilution factor because stars without transiting planets are not necessarily stars without planets. This dilution factor is large for a high planet occurrence (Mann et al. 2012).



We have shown how uncertainties in stellar radius or distance produce correlated errors in a planet's radius and the radiation received from the host star. This effect can produce an artificial correlation in populations of planets where none exists. Recently, such a correlation has been found in both ground-based transit surveys and the Kepler catalog, and highlighted as a test of mechanisms to explain the "inflation" of giant planets on close-in orbits. We quantified the systematic effect of correlated errors in stellar radius in the case of the Kepler KOIs and show that, despite this systematic, the result of D11, i.e. that inflated planets are absent at low irradiance, is still significant. To maximize any systematic effect, we used a very broad range of metallicities (-2.5 to +0.5) and no constraint on stellar distance (e.g., from a model of galactic structure), thus further strengthening our conclusion.



Finally, we have shown how searches for trends of transiting planet radius with stellar
properties may engender systematic errors unless the effect of those properties on apparent
stellar radius, and hence planet radius, is taken into account. We examined the tentative
(2.3σ) claim of DR12 that giant planets around metal-rich stars tend to have smaller
transit depths, because they are smaller and perhaps have larger rocky cores. Performing a
similar analysis on the most recent KOI catalog, we were unable to reproduce that trend.
Moreover, we performed simulations showing that the trend observed by DR12 could
easily be explained by the dependence of stellar radius on metallicity.
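The sign of such a spurious trend follows from the definition of transit depth. The lines
below are a sketch only, assuming that the true planet radius $R_p$ is independent of
metallicity and that stellar radius $R_*$ increases with [Fe/H] for the stars in question;
the symbol $\delta$ for transit depth is introduced here for illustration.

```latex
\delta = \left(\frac{R_p}{R_*}\right)^{2}
\quad\Longrightarrow\quad
\frac{\partial \ln \delta}{\partial\,\mathrm{[Fe/H]}}
  = 2\,\frac{\partial \ln R_p}{\partial\,\mathrm{[Fe/H]}}
  - 2\,\frac{\partial \ln R_*}{\partial\,\mathrm{[Fe/H]}}
  = -2\,\frac{\partial \ln R_*}{\partial\,\mathrm{[Fe/H]}} < 0
\quad\text{if}\quad
\frac{\partial \ln R_p}{\partial\,\mathrm{[Fe/H]}} = 0,\;
\frac{\partial \ln R_*}{\partial\,\mathrm{[Fe/H]}} > 0 .
```

Under these assumptions transit depth declines with metallicity even though the planets
themselves do not change.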

Two limitations of our analysis are that (i) we have assumed Gaussian-distributed
errors in the corrected KIC parameters $T_e$, $\log g$, and [Fe/H], and (ii) the construction of
Bayesian priors on mass, age, and metallicity treats them as independent variables. Neither
of these is strictly correct; the first assumption probably produces an underestimate of
the uncertainty in stellar radius, while the second produces an overestimate of
the uncertainty. Of course, any inadequacies in the Dartmouth stellar evolution models
themselves are not accounted for.

There are other systematic effects which may be present in transit surveys. Two-thirds
of solar-type (F6-K3) stars are found in multiple systems (Raghavan et al. 2010). At
the typical distance of Kepler KOIs with solar-type hosts (950 pc), one 4 arc-second
pixel subtends about 3800 AU, sufficient to include nearly all companions to primaries
(Lepine et al. 2007; Raghavan et al. 2010). The presence of an unresolved companion,
or any background star, will dilute the transit signal. Transits otherwise just above the
detection threshold might be rendered invisible. As a consequence, members of multiple
systems may be underrepresented among stars with transiting planets. For equal-mass
binaries (twins), where the transit signal is lower by a factor of 2, the fractional noise will
decrease by $\sqrt{2}$ (due to the doubling of the signal compared to a comparable single star)
and thus the radius of the smallest detectable planet will increase by a factor of $2^{1/4}$, or
about 1.2. For a power-law size distribution (Equation 8), the number of detectable planets
per star will decrease by a factor of $2^{-\alpha/4}$, or 0.64 for $\alpha = 2.6$. However, nearly-equal mass
binaries represent only 12% of all binaries (Raghavan et al. 2010), and systems with mass
ratios $< 1$ and luminosity ratios $\ll 1$, where the dilution will be much smaller, are the
norm. Star counts reach ~1000 mag$^{-1}$ deg$^{-2}$ at $K_p = 16$, and so there is only a few %
chance of significant dilution by an unrelated star. To the extent that stellar variability
inhibits transit detection, younger and more active stars will also be underrepresented
among KOIs.
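The arithmetic for the twin-binary case can be checked with a few lines of Python. This is a
minimal sketch that assumes photon-limited noise, a transit signal-to-noise ratio
proportional to the square of the planet radius, and a number of planets above a given
radius that scales as that radius to the power $-\alpha$; the function names are
hypothetical.

```python
import math
from typing import Tuple


def pixel_scale_au(pixel_arcsec: float, distance_pc: float) -> float:
    """Projected size in AU; 1 arcsec at 1 pc subtends 1 AU by definition of the parsec."""
    return pixel_arcsec * distance_pc


def twin_binary_factors(alpha: float) -> Tuple[float, float]:
    """Factors by which an equal-brightness companion changes R_min and the planet yield."""
    depth_factor = 0.5                        # doubled flux halves the fractional depth
    noise_factor = 1.0 / math.sqrt(2.0)       # fractional photon noise scales as flux^(-1/2)
    snr_factor = depth_factor / noise_factor  # SNR at fixed planet radius: 2^(-1/2)
    rmin_factor = snr_factor ** -0.5          # SNR scales as R_p^2, so R_min rises by 2^(1/4)
    yield_factor = rmin_factor ** -alpha      # N(>R_min) scales as R_min^(-alpha)
    return rmin_factor, yield_factor


rmin_factor, yield_factor = twin_binary_factors(alpha=2.6)
print(f"pixel scale: {pixel_scale_au(4.0, 950.0):.0f} AU")
print(f"R_min increases by {rmin_factor:.2f}; detectable planets per star scale by {yield_factor:.2f}")
```

The same function can be evaluated for other values of $\alpha$ to see how sensitive the
loss of detectable planets is to the assumed slope of the radius distribution.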

The best defense against the systematic errors we have described is better
characterization of the target stars of transit surveys, especially those hosting planets.
This will reduce, but not entirely eliminate, these biases. Spectroscopic characterization
and refinement of the properties of a fully representative sample of Kepler target stars,
not just the KOI hosts, is vital to robust statistical analyses of the properties of transiting
planets and their parent stars, and such programs are underway (Mann et al. 2012;
Buchhave et al. 2012). Spectra of either modest resolution ($R < 1000$; Malyuto et al. 2001)
or modest SNR (~10; Katz et al. 1998), but not both, can provide substantial improvements over
photometry alone. The Gaia (originally Global Astrometric Interferometer for Astrophysics)
mission, scheduled for launch in August 2013, will obtain parallaxes of stars as faint as
16th magnitude with a standard error of <40 μas (de Bruijne 2012). This will allow the
luminosity of a solar-type star to be determined with an error of about 15% and its radius with
an error of about 8%. The distance to brighter stars will be measured with even greater
precision. Gaia will also obtain moderate-resolution spectra in a narrow region centered
on the Ca II triplet, which can be used to classify stars (Kordopatis et al. 2011) and
measure their radial velocities to a precision of a few km s$^{-1}$. Radial velocities, combined
with parallaxes, yield space motions and membership in distinct stellar populations (e.g.
thin disk, halo). Gaia data will also benefit future transit surveys that will cover all or a
large part of the sky (Deming et al. 2009).
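The link between parallax error and the errors in luminosity and radius can be sketched
with first-order error propagation. The example below is an illustration only, not a
reproduction of the calculation behind the quoted 15% and 8%: it assumes no extinction, a
solar absolute magnitude of 4.83 applied to the 16th-magnitude limit, parallax as the only
error source, and radius scaling as the square root of luminosity at fixed effective
temperature. The function name is hypothetical.

```python
from typing import Tuple


def gaia_fractional_errors(app_mag: float, abs_mag: float,
                           sigma_plx_uas: float) -> Tuple[float, float, float]:
    """First-order fractional errors in distance, luminosity, and radius from parallax alone."""
    dist_pc = 10.0 ** ((app_mag - abs_mag + 5.0) / 5.0)  # distance modulus, no extinction
    plx_uas = 1.0e6 / dist_pc                            # parallax in micro-arcseconds
    frac_dist = sigma_plx_uas / plx_uas                  # sigma_d/d ~ sigma_plx/plx
    frac_lum = 2.0 * frac_dist                           # L scales as d^2 at fixed flux
    frac_rad = 0.5 * frac_lum                            # R scales as L^(1/2) at fixed T_eff
    return frac_dist, frac_lum, frac_rad


frac_dist, frac_lum, frac_rad = gaia_fractional_errors(16.0, 4.83, 40.0)
print(f"luminosity error ~{frac_lum:.0%}, radius error ~{frac_rad:.0%}")
```

Under these simplifications the errors come out at roughly 14% in luminosity and 7% in
radius, of the same order as the values quoted above; additional error terms such as
extinction or effective temperature are not modeled here.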
Fig. 1. — Uncertainty in radius of stars hosting KOIs, defined as one half of the range
encompassing 68% of the probability distribution of radii. A few stars with the largest
uncertainty are off-scale. As defined, the uncertainty can exceed the mean or most likely
value and does in some cases. The adjusted KIC effective temperature is plotted on the
abscissa. Solid points have KIC $\log g > 4$ ("dwarfs"), while open points have $\log g < 4$
("giants"). While K-type dwarfs have uncertainties of as little as ~15%, the radius of F-
and many G-type stars is uncertain by > 100% because of the proximity of the giant and
dwarf branches. The discontinuity at $T_e \sim 3900$ K is an artifact of the grid of models and
the sensitivity to very large M giants.
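The uncertainty definition used in this caption can be made concrete with a short sketch.
It assumes the 68% range is the central interval between the 16th and 84th weighted
percentiles, which is one possible reading of the definition; the function name and the toy
dwarf/giant mixture are hypothetical.

```python
from typing import Sequence


def half_width_68(radii: Sequence[float], weights: Sequence[float]) -> float:
    """Half of the central 68% range of a weighted set of candidate stellar radii."""
    pairs = sorted(zip(radii, weights))
    total = sum(w for _, w in pairs)

    def quantile(q: float) -> float:
        # Smallest radius whose cumulative weight reaches the fraction q of the total.
        cumulative = 0.0
        for radius, weight in pairs:
            cumulative += weight
            if cumulative >= q * total:
                return radius
        return pairs[-1][0]

    return 0.5 * (quantile(0.84) - quantile(0.16))


# Toy bimodal case: 70% of the probability in a dwarf solution, 30% in a giant solution.
radii = [1.0] * 70 + [10.0] * 30
weights = [1.0] * 100
mean_radius = sum(r * w for r, w in zip(radii, weights)) / sum(weights)
uncertainty = half_width_68(radii, weights)
print(f"mean = {mean_radius:.1f}, half-width of 68% range = {uncertainty:.1f}")
```

In this toy mixture the half-width exceeds the mean radius, which is the situation the
caption describes for stars lying between the dwarf and giant branches.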
Fig. 2. — Observed (uncorrected) distribution of KOI radii (open points and dashed line),
and a distribution simulating the effects of Eddington bias (filled points and solid line). The
latter is constructed by adjusting the radius of each planet candidate by the ratio of the most
likely stellar radius to every possible radius among stellar models, weighted by a likelihood
factor (Equation 9). The two normalized distributions are equal at $R_p = 3.4 R_\oplus$. The biased
distribution has a shallower slope at small radii, a less pronounced bump at Jupiter-size, and
a higher occurrence of planets larger than the completeness limit.
Fig. 3. — Predicted biases in planet occurrence and power-law slope $\alpha$ due to Eddington bias
for artificial planets with a power-law radius distribution placed around Kepler KOI-hosting
stars. Howard et al. (2012) report that $\alpha \approx 2.6$ for planets with periods $P < 50$ d. The
slope of the scale-free power-law distribution is only slightly affected by Eddington bias, but
the apparent occurrence is biased upwards because more numerous smaller planets appear
as larger planets due to errors in stellar radius.
Fig. 4. — Effect of Malmquist bias on the radii of stars and their planets. The ratio of the
apparent or "naive" radius to the actual or "bias-informed" radius of 2061 KOIs is plotted
vs. the nominal planet radius from the catalog of Batalha (2012). (239 others are around
stars with incomplete parameters from the Kepler Input Catalog.) The "naive" radius is
the mean radius of possible stellar models weighted according to their consistency with KIC
parameters and priors of mass, age, and metallicity. The "bias-informed" radius is the mean
calculated using the scaling laws for Malmquist bias derived in Section 2.2. In total, 1254 KOIs, and
the vast majority of planet candidates smaller than Neptune, have predicted bias <10%, but
many giant "planets" may have radii twice the nominal value and some may be astrophysical
false positives, i.e. eclipsing stars.
Fig. 5. — Predicted bias in metallicity in transiting planet-selected stars as a consequence
of the relationships between transit depth, stellar radius, and stellar metallicity. The bias
was calculated for all Quarter 6 Kepler target stars (regardless of whether or not they host
KOIs) using Equations 2 and 11 and the methods described in Section 3.1. The upper locus of
positive values consists of evolved stars, for which radius decreases with increasing metallicity.
Fig. 6. — Radius vs. stellar irradiance of candidate giant planets ($8 R_\oplus < R_p < 22 R_\oplus$) in
the latest KOI release. These exclude the KOIs listed as false positives or "ambiguous" in
Table 1 of Demory & Seager (2011). Each large point represents values based on the
stellar radius and luminosity of the Dartmouth stellar model that best reproduces the stellar
parameters from the KIC. The dots represent 200 models chosen randomly from among all
Dartmouth stellar models that cannot be ruled out at 95.4% ($2\sigma$) confidence. The vertical
dotted line marks the suggested boundary between high and low stellar irradiation regimes.
Objects above the horizontal dashed line ($1.2 R_J$) are considered "inflated". Objects below
the dot-dashed line ($8 R_\oplus$) are smaller than reported in the KOI catalog and may have
problematic Kepler lightcurve analyses. These were not included in the statistical analysis.
Fig. 7. — Fraction of "inflated" ($R_p > 1.2 R_J$) candidate planets in the low irradiance
($< 2 \times 10^{8}$ erg s$^{-1}$ cm$^{-2}$) regime in 1000 Monte Carlo simulations of the KOI data set
where planets were randomly shuffled among stars and stellar parameters were resampled
according to standard errors in the KIC values. All KOIs with $R_p > 3 R_\oplus$ were used to
generate the artificial planet populations, but only planets with R_p > were used in the
analysis. The vertical dashed line marks the actual number (3). The p value based on this
distribution is $1.4 \times 10^{-3}$.
Fig. 8. — Distribution of 1000 Kendall $\tau$ values for the correlation between planet radius
and stellar irradiance using the same Monte Carlo realizations of the giant planet KOIs as
in Figure 7. The vertical dashed line marks the actual value ($\tau = 0.31$), corresponding to a
significance (p value) of $6 \times 10^{-4}$.
Fig. 9. — Ratio of planet radius to star radius of 225 Kepler candidate planets with estimated
radii between 5 and 20 $R_\oplus$ vs. metallicity estimates from the Kepler Input Catalog,
uncertain by 0.3 dex (Br11). The curve is a running median (n=21). No trend
with metallicity is apparent.
Fig. 10. — Distribution of Kendall $\tau$ values among 10000 Monte Carlo simulations of the KOI
data set shown in Figure 9. Host star metallicities and KOI radii are scrambled, removing
any intrinsic correlation between metallicity and planet radii. The only correlation here is
due to the increasing radius of stars with metallicity. The dotted line is the value from the
actual data. A $3\sigma$ significance corresponds to $\tau \approx -0.13$. Therefore, there is no significant
correlation in the real data, and, because of the dependence of stellar radius on metallicity,
a null sample would contain a significant ($p < 0.01$) correlation about 60% of the time.
Table 1: Parameter values for TRILEGAL 1.5

Parameter                  Value
-------------------------  ------------------------------------------
Dust:
  Extinction at infinity   0.0378
  Scale height             110 pc
  Scale length             100 kpc
Position of Sun:
  Galactocentric radius    8700 pc
  Height above disk        24.2 pc
Thin disk:
  Zero-age scale height    95 pc
  Radial length scale      2.8 kpc
  Local surface density    59 $M_\odot$ pc$^{-2}$
  Star formation rate      2-step
Thick disk:
  Scale height             800 pc
  Radial length scale      2.8 kpc
  Local density            $1.5 \times 10^{-3}$ $M_\odot$ pc$^{-3}$
  Star formation rate      11-12 Gyr constant
Halo:
  Shape                    $r^{1/4}$ spheroid
  Scale length             2.8 kpc
  Oblateness               0.65
  Local density            $1.5 \times 10^{-4}$ $M_\odot$ pc$^{-3}$
  Star formation rate      12-13 Gyr





